Ensembles of Multi-Scale VGG Acoustic Models
Authors
Abstract
We present our work on constructing multi-scale deep convolutional neural networks for automatic speech recognition. We trained several VGG nets that differ only in the kernel size of their convolutional layers. The general idea is that receptive fields of different sizes match structures at different scales and therefore support more robust recognition when combined appropriately. We construct a large multi-scale system by means of system combination. We use ROVER and the fusion of posterior predictions as examples of late combination, and knowledge distillation using soft labels from a model ensemble as a form of early combination. In this work, distillation is approached from the perspective of knowledge transfer pretraining, followed by fine-tuning on the original hard labels. Our results show that the individual recognition strengths of the VGGs can be bundled into a much simpler CNN architecture that performs on par with the best late combination.
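The two combination styles named above can be illustrated per acoustic frame with a minimal sketch. It assumes that posterior fusion means an (optionally weighted) linear average of the models' frame-level class posteriors, and that distillation trains the student with cross-entropy against the ensemble's averaged soft labels, followed by ordinary cross-entropy against the hard labels. The abstract does not specify the fusion rule, weights, or any softmax temperature, so `temperature` and all function names here are hypothetical. ROVER, which votes over aligned word hypotheses rather than frame posteriors, is not shown.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; `temperature` is an assumed knob, not from the paper."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_posteriors(posteriors: Sequence[Sequence[float]],
                    weights: Optional[Sequence[float]] = None) -> List[float]:
    """Late combination (assumed linear): weighted average of per-model posteriors for one frame."""
    n = len(posteriors)
    if weights is None:
        weights = [1.0 / n] * n  # equal weights unless told otherwise
    num_classes = len(posteriors[0])
    fused = [0.0] * num_classes
    for w, p in zip(weights, posteriors):
        for k in range(num_classes):
            fused[k] += w * p[k]
    return fused


def cross_entropy(targets: Sequence[float], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Cross-entropy H(targets, probs), clipping probabilities to avoid log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(targets, probs))


def distillation_loss(student_logits: Sequence[float],
                      teacher_logit_list: Sequence[Sequence[float]],
                      temperature: float = 1.0) -> float:
    """Early combination (pretraining stage): fit the student to the ensemble's soft labels."""
    soft_targets = fuse_posteriors([softmax(t, temperature) for t in teacher_logit_list])
    return cross_entropy(soft_targets, softmax(student_logits, temperature))


def hard_label_loss(student_logits: Sequence[float], label: int) -> float:
    """Fine-tuning stage: standard cross-entropy against the original hard label."""
    return -math.log(max(softmax(student_logits)[label], 1e-12))


if __name__ == "__main__":
    # Hypothetical frame: two teacher VGGs (different kernel sizes) and one student CNN, 3 classes.
    teachers = [[2.0, 0.5, -1.0], [1.0, 1.5, -0.5]]
    student = [0.3, 0.2, 0.1]
    print("fused posteriors:", fuse_posteriors([softmax(t) for t in teachers]))
    print("distillation loss:", distillation_loss(student, teachers))
    print("hard-label loss:", hard_label_loss(student, label=0))
```

In this sketch the same fused posterior serves both purposes: at decoding time it is the late-combination output, while at training time it becomes the soft target for the single student network, which is then fine-tuned with `hard_label_loss`.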
Similar resources
Efficient Knowledge Distillation from an Ensemble of Teachers
This paper describes the effectiveness of knowledge distillation using teacher student training for building accurate and compact neural networks. We show that with knowledge distillation, information from multiple acoustic models like very deep VGG networks and Long Short-Term Memory (LSTM) models can be used to train standard convolutional neural network (CNN) acoustic models for a variety of...
Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features
On small datasets, discriminatively trained bottleneck features from deep networks commonly outperform more traditional spectral or cepstral features. While these features are typically trained with small, fully-connected networks, recent studies have used more sophisticated networks with great success. We use the recent deep CNN (VGG) network for bottleneck feature extraction—previously used o...
Effect of porosity on the characteristics of underwater acoustic sound absorbers using theoretical models
Porous materials have good acoustic damping characteristics over a wide frequency range. As for sound waves, many small-scale pores in the coating materials can convert underwater-coating to rough surfaces. The main property of porous absorbents is their resistance against incident sound wave that leads to damping effect. From a physical point of view, damping occurs due to friction between flu...
Use of multi-model ensembles from global climate models for assessment of climate change impacts
Multi-model ensembles of climate predictions constructed by running several global climate models for a common set of experiments are available for impact assessment of climate change. Multi-model ensembles emphasize the uncertainty in climate predictions resulting from structural differences in the global climate models as well as uncertainty due to variations in initial conditions or model pa...
Chapter 2: Neuronal ensemble coding of birdsong
In many models of neural population coding, similar sensory or motor states are represented in the brain by similar neural ensembles (Georgopoulos et al., 1999; Lewis and Kristan, 1998; Wilson and McNaughton, 1993). To explore this issue, we measured the activity of large numbers of single neurons in the pre-motor nucleus RA of the singing zebra finch. During singing, individual RA neurons gene...